Deep networks have recently enjoyed enormous success when applied to recognition and classification problems in computer vision, but their use in graphics problems has been limited. In this work, we present a novel deep architecture that performs new view synthesis directly from pixels, trained from a large number of posed image sets. In contrast to traditional approaches, which consist of multiple complex stages of processing, each of which requires careful tuning and can fail in unexpected ways, our system is trained end-to-end. The pixels from neighboring views of a scene are presented to the network, which then directly produces the pixels of the unseen view. The benefits of our approach include generality (we only require posed image sets and can easily apply our method to different domains) and high-quality results on traditionally difficult scenes. We believe this is due to the end-to-end nature of our system, which is able to plausibly generate pixels according to color, depth, and texture priors learnt automatically from the training data. To verify our method, we show that it can convincingly reproduce known test views from nearby imagery. Additionally, we show images rendered from novel viewpoints. To our knowledge, our work is the first to apply deep learning to the problem of new view synthesis from sets of real-world, natural imagery.
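The core input/output contract described above (pixels of posed neighbor views in, pixels of the unseen view out) can be sketched minimally as follows. This is not the paper's architecture; the learned network is replaced here by a simple fixed per-view blend, and the function name and weight parameter are hypothetical, purely for illustrating the mapping's shape.

```python
import numpy as np

def synthesize_view(neighbor_views, view_weights):
    """Toy stand-in for end-to-end view synthesis.

    neighbor_views: (N, H, W, 3) array -- pixels of N posed input views.
    view_weights:   length-N weights -- a fixed blend standing in for the
                    learned mapping from neighbor pixels to target pixels.
    Returns an (H, W, 3) array of synthesized target-view pixels.
    """
    w = np.asarray(view_weights, dtype=float)
    w = w / w.sum()  # normalize so output stays in the input color range
    # Contract the view axis: weighted combination of the neighbor images.
    return np.tensordot(w, np.asarray(neighbor_views, dtype=float), axes=1)

# Example: blend two 4x4 neighbor views (all-zeros and all-ones images).
views = np.stack([np.zeros((4, 4, 3)), np.ones((4, 4, 3))])
out = synthesize_view(views, [1.0, 3.0])  # each output pixel is 0.75
```

In the actual system, the blend above would be a deep network trained on posed image sets, so the color, depth, and texture priors are learnt from data rather than hand-specified.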